mean reward
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- North America > United States > California > San Mateo County > San Mateo (0.04)
- (3 more...)
- North America > United States > Illinois > Cook County > Chicago (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Kosovo > District of Gjilan > Kamenica (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (2 more...)
Appendix A Lower Bound In this section, we establish a lower bound on the expected regret of any algorithm for our multi-agent
Our goal in this section is twofold. To avoid excessive repetition of notation and proof arguments, we purposefully leave this section not self-contained and only outline these adjustments needed. We refer an interested reader to the work of Auer et al. We leave it to future work to optimize the dependence on N and K . In our MA-MAB problem, an algorithm is allowed to "pull" a distribution over the arms In the proof of their Lemma A.1, in the explanation of their Equation (30), they cite the assumption that given the rewards observed in the first Finally, in the proof of their Theorem A.2, they again consider the probability Thus, we have the following lower bound.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (5 more...)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Virginia > Arlington County > Arlington (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.94)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.95)
From Finite to Countable-Armed Bandits
We consider a stochastic bandit problem with countably many arms that belong to a finite set of types, each characterized by a unique mean reward. In addition, there is a fixed distribution over types which sets the proportion of each type in the population of arms. The decision maker is oblivious to the type of any arm and to the aforementioned distribution over types, but perfectly knows the total number of types occurring in the population of arms.
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)